Character-Position Arithmetic for Analogy Questions between Word Forms
Abstract
We show how to answer analogy questions A : B :: C : D with unknown D between word forms, relying essentially on the basic arithmetic equality $D[i_B - i_A + i_C] = B[i_B] - A[i_A] + C[i_C]$, which operates on characters and positions at the same time. We decompose the problem into two steps: specification and decoding. We examine several techniques to implement each of these two steps. We perform experiments on a set of positive and negative examples and assess the accuracy of combinations of techniques. We then evaluate the performance of the best combination of techniques on a large set of more than 40 million analogy questions from the training data of a shared task in morphology. We obtain the correct answer in 94% of the cases.
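To make the equality concrete, here is a minimal sketch of its simplest special case, in which the three given words have the same length and the positions coincide (i_A = i_B = i_C = i), so that D[i] = B[i] - A[i] + C[i] on character codes. The function name `solve_aligned` and the use of Unicode code points as the character arithmetic are assumptions made for illustration; this is not the paper's specification and decoding procedure, which handles differing positions and word lengths.

```python
from typing import Optional


def solve_aligned(a: str, b: str, c: str) -> Optional[str]:
    """Solve a : b :: c : d in the position-aligned special case.

    Hypothetical helper: assumes equal-length words and i_A = i_B = i_C,
    so each character of d is B[i] - A[i] + C[i] on code points.
    Returns None when the special case does not apply.
    """
    if not (len(a) == len(b) == len(c)):
        return None  # positions cannot be aligned one-to-one

    result = []
    for char_a, char_b, char_c in zip(a, b, c):
        code = ord(char_b) - ord(char_a) + ord(char_c)
        if not 0 <= code <= 0x10FFFF:
            return None  # arithmetic left the valid code point range
        result.append(chr(code))
    return "".join(result)


if __name__ == "__main__":
    print(solve_aligned("walk", "talk", "wall"))
    print(solve_aligned("cat", "bat", "cot"))
```

At each position where A and B agree, the result copies C; where A and C agree, it copies B. Suffixation, prefixation, and other changes that shift positions fall outside this restricted case.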
Similar resources
What Analogies Reveal about Word Vectors and their Compositionality
Analogy completion via vector arithmetic has become a common means of demonstrating the compositionality of word embeddings. Previous work has shown that this strategy works more reliably for certain types of analogical word relationships than for others, but these studies have not offered a convincing account for why this is the case. We arrive at such an account through an experiment that ta...
Clustering Word Pairs to Answer Analogy Questions
We focus on answering word analogy questions by using clustering techniques. The increased performance in answering word similarity questions can have many possible applications, including question answering and information retrieval. We present an analysis of clustering algorithms’ performance on answering word similarity questions. This paper’s contributions can be summarized as: (i) casting ...
Traversal-Free Word Vector Evaluation in Analogy Space
In this paper, we propose an alternative evaluation metric for word analogy questions (A is to B as C is to D) in word vector evaluation. Unlike the traditional method, which predicts the fourth word from the given three, we measure the similarity directly on the “relations” of two pairs of given words, just as shifting the relation vectors into a new analogy space. Cosine and Euclidean dista...
Probabilistic Model for Segmentation Based Word Recognition with Lexicon
The problem of off-line reading of unconstrained handwritten words has been studied extensively due to its role in many important applications such as reading addresses on mail-pieces [3, 6, 11], reading amounts on bank checks [7, 10], extracting census data on forms [2, 9], and reading address blocks on tax forms [12]. The main challenges are the wide variety of writing styles, poor image quality ...
Linguistic Regularities in Sparse and Explicit Word Representations
Recent work has shown that neural-embedded word representations capture many relational similarities, which can be recovered by means of vector arithmetic in the embedded space. We show that Mikolov et al.’s method of first adding and subtracting word vectors, and then searching for a word similar to the result, is equivalent to searching for a word that maximizes a linear combination of three p...